COMMENT ON WESTERGAARD'S "SCOPE AND
METHOD OF STATISTICS."

By Carl J. West, Ohio State University. 



It is often desired to estimate the value of a population for 
intercensal years. How is it to be done? Since a direct 
enumeration is out of the question some indirect method must 
be employed. As a basis for an estimate it may be assumed, 
for instance, that the ten year increase has proceeded by equal 
annual additions, or that there has been a constant annual 
rate of increase. Before the question of the method to be 
used can be definitely answered, the statistician must be 
familiar with the advantages and the disadvantages of each 
method. 
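
The two assumptions just named can be compared side by side. The following is a minimal sketch, not a recommended procedure: the function names and the census counts are hypothetical, chosen only to show how the estimates diverge between the census dates.

```python
def arithmetic_estimate(p_first, p_last, years_elapsed, span=10):
    """Equal annual additions: straight-line interpolation between censuses."""
    return p_first + (p_last - p_first) * years_elapsed / span


def geometric_estimate(p_first, p_last, years_elapsed, span=10):
    """Constant annual rate of increase: geometric interpolation."""
    return p_first * (p_last / p_first) ** (years_elapsed / span)


# Hypothetical census counts, used for illustration only.
first_census, last_census = 1_000_000, 1_210_000

for year in range(0, 11, 5):
    a = arithmetic_estimate(first_census, last_census, year)
    g = geometric_estimate(first_census, last_census, year)
    print(f"year {year:2d}: arithmetic {a:12.0f}   geometric {g:12.0f}")
```

Both methods agree at the census dates themselves. For a growing population the constant-rate estimate lies below the equal-additions estimate in every intercensal year, which is one of the differences the statistician must weigh.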

A second problem of importance to the statistician making 
use of census data is that of distributing the excessive 
numbers appearing at the even ages and especially at certain 
ages, at age 40 for instance. No method of reenumeration can 
assist in the solution of this problem because the error is 
essentially inherent in the only practicable methods of enumeration.
It might among other possible assumptions be assumed
that the excess at age 40 was drawn from ages 41 and 42, 
or that a graphic solution is the best. Ordinarily not until 
quite complicated methods are made use of in removing these 
and other irregularities can the census figures be said to show 
the approximate age distribution. 

It is not necessary to bring forward repeated illustrations to 
show that a statistician must be more than a mere enumerator, 
that even in the most simple statistical work some theory is 
absolutely necessary. It must be equally clear that instead of 
much of the personal and tentative methods of present day 
statistics there should be a consistent body of statistical theory 
which would be generally understood and applied. What is 
needed is more articles and more books and treatises presenting
the theory and methods of statistics in a simple style.
And further and above everything else the theory and methods
should be presented as a distinct science, the science of statistics.
Only when statistics is generally recognized as a science worthy 
of equal rank with physics and chemistry and biology will it 
secure the proper respect and recognition without which the 
labor of statisticians is partially in vain. 

Let us examine the article, "Scope and Method of Statistics,"
with the purpose of discussing Westergaard's idea of
the necessity for the development of an exact science of 
statistics. 

We begin with the theory of representative statistics. As a 
class economists have hesitated to make use of representative 
statistics, perhaps because little use can be made of such 
material without reference to certain very practical points of 
method. When can a selection be considered as representative, 
as typical of the whole class of individuals from which it is 
drawn? How is the significance of a variation in a sample or 
selection to be determined? are some of the questions which 
confront the statistician. "What remains to be done is
simply to develop a theory of these representative enumerations,
stating the limits of the deviations from the true proportions
and showing how to approach as nearly as possible to
the truth. In fact in many cases it will be practically impossible
to do without representative statistics."

The matter of inaccuracies in the data is always one of great 
concern to the careful statistician. Note the statements of 
Dr. Westergaard: "a certain pessimism is often encountered 
among official statisticians and economists who attempt to 
draw conclusions from statistics. It is maintained that the 
inaccuracies are often so great that it is impossible to get any 
reliable results. Here, then, is another important problem 
in the theory of statistics, viz., to determine the significance 
of the inaccuracies, to state to what extent it is possible to 
draw conclusions from statistical data in spite of their imperfections.
Here it would be most useful to form a theory of the
applicability of imperfect data.* For it can easily be shown that
even extremely incorrect data may under certain circumstances 
allow of perfectly safe conclusions." 

* Italics are mine.

The author's discussion of the "law of error" in Section IV
is of far reaching significance. To illustrate the nature of the
discussion suppose of 400 patients with a certain disease 200 
have been treated with a serum and 200 have not been so 
treated. Of the first class suppose further that 125 recover 
and of the second class only 100 patients recover. Is the 
observed excess of recoveries among the patients of the first 
class sufficient to show the efficacy of the serum treatment? 
In a certain community the ratio of male to female births in 
one year is 110 to 100 instead of the more usual ratio of 105 
or so to the 100. Can one, because of this larger ratio, be 
certain that some cause is present which tends to increase the 
relative proportion of male births or is the observed excess 
merely a chance fluctuation, just as when tossing coins a run 
of heads may occur? As the basis for a fundamental theory 
of these variations Professor Westergaard advocates the use 
of the standard deviation, or "mean error," of a binomial 
series taken in connection with a normal law, or "law of error" 
as he terms it. He presents a strong case for the use of this 
method of measuring variations. It is of course true that not 
all data even approximate to the binomial law, but actual 
trial and experimentation show that even so the standard 
deviation constitutes a fairly reliable measure of variability. 
Altogether there appears to be no more adaptable basis for a
general theory of variability than that furnished by the standard
deviation.
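
The two illustrations above can be worked numerically. What follows is a minimal sketch, assuming that the standard deviation of a difference between two proportions is computed from the pooled proportion (a common modern convention, not a detail taken from Westergaard's article). The function names are illustrative, and the number of births is hypothetical, since only the ratio is given in the text.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Difference of two observed proportions in units of its binomial
    standard deviation ("mean error"), using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    mean_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / mean_error


def one_proportion_z(p_observed, p_usual, n):
    """Deviation of an observed proportion from the usual one, in units of
    the binomial standard deviation for n trials."""
    mean_error = math.sqrt(p_usual * (1 - p_usual) / n)
    return (p_observed - p_usual) / mean_error


# Serum illustration: 125 of 200 treated recover, 100 of 200 untreated.
z_serum = two_proportion_z(125, 200, 100, 200)
print(f"serum: excess is {z_serum:.2f} times its mean error")

# Birth illustration: 110 males to 100 females against the usual 105 to 100.
# The totals of 2,100 and 21,000 births are hypothetical.
for births in (2_100, 21_000):
    z_births = one_proportion_z(110 / 210, 105 / 205, births)
    print(f"{births} births: excess is {z_births:.2f} times its mean error")
```

Under these assumptions the excess of recoveries is roughly two and a half times its mean error, which the normal law treats as an uncommon chance result. The same birth ratio, on the other hand, is unremarkable in a small community and striking in a large one, so the ratio alone cannot settle the question.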

The idea of the author that statistical data can be broken 
up into a series of normal distributions is of extremely high 
importance as a theoretical basis for the study of a statistical 
distribution. The analysis of a given set of statistical measurements
by identifying the component variations and values
as constituent normal distributions is one of the most
significant ways imaginable for extracting information from 
stubborn data. It gives an arithmetic background and 
general basis for the work. The statistician with the theory 
of the normal distribution in mind feels more confident, feels 
that he has a standardized viewpoint, so to speak. Probably 
no statistical distribution can be said to have yielded all its 
information until it is broken up into the constituent "laws 
of error," or normal distributions. It may here be of interest
to define, in statistical terminology, a normal distribution.
A series of measurements is said to obey the normal law when each 
measurement, X, is equal to the sum of a relatively large number of 
smaller elements where each constituent element has a value chosen 
at random.* Thus the height of a person is the sum of the
lengths of certain bones, of the widths of certain cartilages, 
and each of these elemental values is, within rather narrow 
limits, about as likely to take one value as another. Hence it 
would be anticipated that a distribution of height would be 
normal. The just mentioned value to statistics of basing the 
theory on a normal distribution is not lessened by the fact 
that in practice it is ordinarily not easy to split a distribution 
into its constituent normal distributions. 
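
The definition can be tried by direct experiment. The following is a rough sketch under stated assumptions: each measurement is built as the sum of twelve elements, each drawn uniformly between 0 and 1, and these figures and the function names are hypothetical choices made for illustration only. Under the normal law about 68 per cent of the measurements should lie within one standard deviation of the mean.

```python
import random
import statistics


def simulate_sums(n_measurements, n_elements, seed=0):
    """Build each measurement as the sum of n_elements random elements,
    every element being equally likely to fall anywhere between 0 and 1."""
    rng = random.Random(seed)  # fixed seed so the experiment can be repeated
    return [
        sum(rng.uniform(0.0, 1.0) for _ in range(n_elements))
        for _ in range(n_measurements)
    ]


def share_within_one_sd(values):
    """Fraction of the values lying within one standard deviation of the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(abs(v - mean) <= sd for v in values) / len(values)


sums = simulate_sums(n_measurements=20_000, n_elements=12)
print(f"mean of the sums = {statistics.fmean(sums):.2f}")
print(f"share within one standard deviation = {share_within_one_sd(sums):.3f}")
```

Although no single element is normally distributed, the sums already come close to the proportion which the normal law requires.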

The author shows the true instinct of the statistician when he 
says at one place "On the whole we may here lay down the
principle that this interpolation first of all should keep as near
real life as is possible, taking everything into consideration which
might be of importance";† at another place, "The best means
of treating the changes of a variable in a series of observations
is the method used by Daniel Bernoulli and Duvillard. They
used the fiction that all variations were continuous so that it
was possible to use the differential calculus. . . . This
method is of great value as it simplifies many problems
extremely.† . . . The main difficulty at present seems rather
to consist in providing sufficiently complete and correct observations
than in dealing with these materials afterwards."

The tendency of the author to systematize, to reduce to 
general principles, is shown by the sentence, "Still more complicated
are the problems of economic statistics for here we
frequently have one dimension more,† we have to deal not only
with numbers of quantities but with their value in money as 
well." 

Again, "Here we have many methods of procedure. In 
determining which method is the best it is a good principle always 
to keep as near as possible to the original observations, never 
leaving them out of sight, with as few intermediate links as 
possible." 

* Edgeworth, "Law of Error," Cambridge Phil. Trans., 1905.
† Italics are mine.

On the formula of correlation we find, "The formula of
correlation* will prove useful in all cases where the points are 
grouped nearly around a straight line; . . . still we must 
not forget that this formula removes us somewhat from the original 
data and that it does not relieve us from the necessity of making 
a close investigation of these observations. On the whole the 
formula of correlation does not introduce any new principle; 
by tabulating and grouping the observations we can easily 
establish as a rule the fact of correlation without the use of the 
formula." The author does not mention the value of a 
diagram — a scatter or correlation diagram, and he is apparently 
not familiar with the properties of the correlation ratio, but 
his viewpoint is fundamentally that of seeking the underlying 
principles of the science and of discussing each proposed method 
with reference to those principles. 
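
For reference, the coefficient of correlation discussed above may be computed as follows. This is a minimal sketch of the ordinary product-moment coefficient; the function name and the paired observations are hypothetical and are not drawn from Westergaard's article.

```python
import math


def correlation_coefficient(xs, ys):
    """Product-moment coefficient of correlation for paired observations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least two pairs")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products and of squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a series with no variation has no correlation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired observations, used for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(f"r = {correlation_coefficient(x, y):.3f}")
```

The single figure summarizes how closely the points are grouped around a straight line, but, as the author insists, it does not relieve the statistician of examining the paired observations themselves.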

Again with the same purposes in mind he writes with reference,
apparently, to Professor Pearson's elaborate methods
for determining correlation for non-measurable characteristics 
such as hair color, disposition, material wealth, and so on, 
"but at present we are more in need of statistical data than of 
theoretical investigations. This may sound curious in view of 
the immense mass of details which are published every day 
by the numerous statistical institutions all over the world, 
but to a great extent all these reports are repetitions, so to 
speak, of older investigations, most of them made in a single 
mold. ... Of course we cannot do without these myriads 
of statistical volumes, they have, at least the greatest bulk of 
them, their local claim to exist; — but beyond these reports 
numerous problems are waiting to be solved and it will require
much patience and much careful work in gathering the
necessary materials." 

It would perhaps be better to say that there is equal 
need of statistical data and theoretical investigations. For 
until the methods are clearly in mind the search for data is 
likely to continue to be a more or less haphazard groping in 
the dark. There must be some model, some plan, for the 
collection of data and there is nothing which can give a greater
unity to the gathering of the material than can generally
accepted and applied methods.

* The coefficient of correlation alone is referred to here. C. J. W.

In conclusion it may be said that Dr. Westergaard in this 
article shows a thorough appreciation of the needs and difficulties
of practical statistical work together with that scientific
sympathy and spirit which makes all his work count towards 
the development of statistics into a distinct methodological 
science. 



